```r
df = as.data.frame(ortho, xy = TRUE, na.rm = FALSE)
idx = which(complete.cases(df))
# without data scaling, X and Y have more influence on the kmeans results
df_omit = scale(df[idx, ])
# df_omit = df[idx, ]
```
Performing a k-means clustering and smoothing the results:
```r
mdl = kmeans(df_omit, 100)
#> Warning: did not converge in 10 iterations
```
```r
vec = rep(NA_integer_, ncell(ortho))
vec[idx] = mdl$cluster
rcl = rast(ortho, nlyrs = 1, vals = vec)
rcl = focal(rcl, w = 5, fun = "modal")  # smooth
```
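As an aside, the convergence warning above can usually be avoided by raising `iter.max` (the default is 10) and/or trying several random starts via `nstart`. A minimal self-contained sketch on synthetic data (the matrix `m` is a stand-in for `df_omit`, not data from the original example):

```r
set.seed(1)
m = matrix(rnorm(3000), ncol = 3)  # 1000 rows x 3 columns, stand-in data
# iter.max raises the iteration cap; nstart runs several random
# initializations and keeps the best (lowest total within-cluster SS)
mdl2 = kmeans(m, centers = 10, iter.max = 25, nstart = 5)
```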
My comments below are very brief (and thus omit many details):
Have you read https://doi.org/10.1016/j.jag.2022.102935? I think it may explain some things…
Supercells/superpixels usually aim not to create segments (per se), but rather to create groups of fairly homogeneous cells that could be merged later.
K-means vs SLIC: the superpixels SLIC method is sometimes described as a spatially constrained k-means. Why this approach is useful:
we do not look at all of the pixels when creating a single cluster, which makes the calculations more efficient.
it gives us more control over the output number of polygons (and/or their sizes).
K-means vs supercells: k-means uses the Euclidean distance only and calculates centroids as averages of the values. The supercells approach is more flexible: it allows selecting one of many distance measures and averaging functions. The example in the vignette is (hopefully) easy to understand, but the actual power of this approach shows when you have many unrelated raster layers.
Supercells vs hierarchical clustering: hierarchical clustering requires calculating a dissimilarity matrix between all cells, so a raster with n cells needs an n by n matrix in memory. Even a modest raster has far too many cells for that to fit into your computer memory. Now try to think of an even larger raster…
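To put a rough number on it (my back-of-the-envelope arithmetic, not a measured result): a dense dissimilarity matrix for an n-cell raster has n squared entries, so its size explodes quickly:

```r
n = 1e4 * 1e4        # cells in a 10000 x 10000 raster: 1e8
bytes = n^2 * 8      # dense double-precision dissimilarity matrix
tib = bytes / 2^40   # size in TiB -- tens of thousands of terabytes
```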
As you may see above: k-means segments are less homogeneous, and you have less control over their number. They also often produce very small and very large polygons, and they “grow” into NA areas.
The supercells package works on fairly large data (given some limitations), see https://github.com/Nowosad/supercells/issues/10#issuecomment-962447901
Supercells has a few important arguments, including compactness and clean. Deciding on the best compactness value is not always easy: one possible approach is to disable cleaning and then test a few compactness values on some smaller areas. See https://github.com/Nowosad/supercells/issues/21#issuecomment-1339728555
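For reference, a minimal sketch of how those arguments could be explored (this assumes the supercells and terra packages are installed; I use the built-in volcano matrix as a small stand-in raster, and the argument names k, compactness, and clean as documented in the package):

```r
library(terra)
library(supercells)

v = rast(volcano)  # small built-in elevation grid as a toy raster
# disable cleaning first, then compare a few compactness values:
# higher compactness gives squarer supercells, lower compactness
# lets them follow the raster values more closely
sc_low  = supercells(v, k = 50, compactness = 0.1, clean = FALSE)
sc_high = supercells(v, k = 50, compactness = 10,  clean = FALSE)
plot(v)
plot(vect(sc_low), add = TRUE)
```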